Fast attribute-based table clustering using Predicate-Trees: A vertical data mining approach
نویسندگان
چکیده
With technological advancements, massive amount of data is being collected in various domains. For instance, since the advent of digital image technology and remote sensing imagery (RSI), NASA and U.S. Geological Survey through the Landsat Data Continuity Mission, has been capturing images of Earth down to 15 meters resolution. Likewise, consider the Internet, where, growth of social media, blog Web sites , etc. generates exponential amount of textual data on a daily basis. Since clustering of data is time-consuming, much of these data is archived even before proper analysis. In this paper, we propose two novel and extremely fast algorithms called imgFAUST or Fast Attribute-based Unsupervised and Supervised Table Clustering for images and a variation called docFAUST for textual data. Both these algorithms are based on Predicate-Trees which are compressed, lossless and data-mining-ready data structures. Without compromising much on the accuracy, our algorithms are fast and can be effectively used in high-speed image data and document analysis.
منابع مشابه
Fast Attribute-based Unsupervised and Supervised Table Clustering using P-Trees
Since the advent of digital image technology and remote sensing imagery (RSI), massive amount of image data has been collected worldwide. For example, since 1972, NASA and U.S. Geological Survey through the Landsat Data Continuity Mission, has been capturing images of Earth down to 15 meter resolution. Since image clustering is time-consuming, much of this data is archived even before analysis....
متن کاملPredicate-Tree Based Pretty Good Privacy of Data
Growth of Internet has led to exponential rise in data communication over the World Wide Web. Several applications and entities such as online banking transactions, stock trading, e-commerce Web sites, etc. are at a constant risk of eavesdropping and hacking. Hence, security of data is of prime concern. Recently, vertical data have gained lot of focus because of their significant performance be...
متن کاملEfficient Quantitative Frequent Pattern Mining Using Predicate Trees
Databases information is not limited to categorical attributes, but also contains much quantitative data. The typical approach of quantitative association rule mining is achieved by using intervals. Such approach involves many interval generations and merges, and requires reconstruction of tree structures, such as hash trees, R-trees, and FP-trees. When intervals change, these tree structures a...
متن کاملA Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
متن کاملRetaining Customers Using Clustering and Association Rules in Insurance Industry: A Case Study
This study clusters customers and finds the characteristics of different groups in a life insurance company in order to find a way for prediction of customer behavior based on payment. The approach is to use clustering and association rules based on CRISP-DM methodology in data mining. The researcher could classify customers of each policy in three different clusters, using association rules. A...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- J. Comput. Meth. in Science and Engineering
دوره 12 شماره
صفحات -
تاریخ انتشار 2012